Imagine that you’re an environmental scientist interested in predicting daily levels of atmospheric ozone pollution in Los Angeles. Ozone is an allotrope of oxygen: a molecule with three oxygen atoms instead of two. While ozone in the stratosphere protects us from the sun’s UV rays, products from burning fossil fuels can be converted into ozone at ground level, where it is toxic. Our job is to build a regression model that can predict ozone pollution levels based on the time of year & meteorological readings, such as humidity & temperature.
Let’s load the data & explore it.
data(Ozone, package = 'mlbench')
ozoneTib <- as_tibble(Ozone)
names(ozoneTib) <- c('Month', 'Date', 'Day', 'Ozone', 'Press_height', 'Wind', 'Humid', 'Temp_Sand', 'Temp_Monte', 'Inv_height', 'Press_grad', 'Inv_temp', 'Visib')
ozoneTib
## # A tibble: 366 × 13
## Month Date Day Ozone Press_height Wind Humid Temp_Sand Temp_Monte
## <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 4 3 5480 8 20 NA NA
## 2 1 2 5 3 5660 6 NA 38 NA
## 3 1 3 6 3 5710 4 28 40 NA
## 4 1 4 7 5 5700 3 37 45 NA
## 5 1 5 1 5 5760 3 51 54 45.3
## 6 1 6 2 6 5720 4 69 35 49.6
## 7 1 7 3 4 5790 6 19 45 46.4
## 8 1 8 4 4 5790 3 25 55 52.7
## 9 1 9 5 6 5700 3 73 41 48.0
## 10 1 10 6 7 5700 3 59 44 NA
## # … with 356 more rows, and 4 more variables: Inv_height <dbl>,
## # Press_grad <dbl>, Inv_temp <dbl>, Visib <dbl>
We have a tibble containing 366 cases & 13 variables of daily meteorological & ozone readings. At present, the Month, Date, & Day variables are factors. Arguably this makes sense, but we’re going to treat them as numerics for this exercise. To do this, we use the handy mutate_all() function, which takes the data as its first argument & a function as its second argument. Here, we use as.numeric() to convert all the variables into the numeric class.
Next, we have some missing data in the data set. Missing data is okay in our predictor variables (we’ll deal with this later using imputation), but missing data for the variable we’re trying to predict is not okay. Therefore, we remove the cases without an ozone measurement by piping the result of the mutate_all() call into the filter() function, where we remove cases with an NA value for Ozone.
ozoneClean <- mutate_all(ozoneTib, as.numeric) %>%
  filter(is.na(Ozone) == FALSE)
ozoneClean
## # A tibble: 361 × 13
## Month Date Day Ozone Press_height Wind Humid Temp_Sand Temp_Monte
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 4 3 5480 8 20 NA NA
## 2 1 2 5 3 5660 6 NA 38 NA
## 3 1 3 6 3 5710 4 28 40 NA
## 4 1 4 7 5 5700 3 37 45 NA
## 5 1 5 1 5 5760 3 51 54 45.3
## 6 1 6 2 6 5720 4 69 35 49.6
## 7 1 7 3 4 5790 6 19 45 46.4
## 8 1 8 4 4 5790 3 25 55 52.7
## 9 1 9 5 6 5700 3 73 41 48.0
## 10 1 10 6 7 5700 3 59 44 NA
## # … with 351 more rows, and 4 more variables: Inv_height <dbl>,
## # Press_grad <dbl>, Inv_temp <dbl>, Visib <dbl>
Note: Could we have imputed the missing data in our target variable? Yes, we could, but this has the potential to introduce bias into our model, because we would be training a model to predict values that were themselves generated by a model.
Let’s plot each of our predictor variables against Ozone to get an idea of the relationships in the data. We start with our usual trick of gathering the variables with the gather() function so we can plot them on separate facets.
ozoneUntidy <- gather(ozoneClean, key = 'Variable', value = 'Value', -Ozone)
ggplotly(
  ggplot(ozoneUntidy, aes(Value, Ozone)) +
    facet_wrap(~ Variable, scale = 'free_x') +
    geom_point(size = 0.5) +
    geom_smooth() +
    geom_smooth(method = 'lm', col = 'red') +
    theme_bw()
)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
In our ggplot() call, we facet by Variable & allow the x-axes of the facets to vary by setting the scale argument equal to 'free_x'. Then, along with a geom_point() layer, we add two geom_smooth() layers. The first geom_smooth() is given no arguments & so uses the default settings. By default, geom_smooth() will fit a LOESS curve to the data (a curvy, local regression line) if there are fewer than 1,000 cases, or a GAM curve if there are 1,000 or more cases. Either will give us an idea of the shape of the relationships. The second geom_smooth() layer specifically asks for the lm method (linear model), which draws the linear regression line that best fits the data. Drawing both will help us identify relationships in the data that are nonlinear.
The resulting plot is shown above. Some of the predictors have a linear relationship with ozone levels, some have a nonlinear relationship, & some seem to have no relationship at all.
Linear regression cannot handle missing values. Therefore, to avoid having to throw away a large portion of our data set, we’ll use imputation to fill in the gaps. In previous lessons, we used mean imputation to replace missing values (NAs) with the mean of the variable. While this may work, it only uses the information within a single variable, & every missing value of that variable will take the same imputed value, potentially biasing the model. Instead, we can use machine learning algorithms to predict the value of each missing observation, using all of the other variables in the data set.
There are multiple imputation methods that come packaged with mlr. These include methods such as imputeMean(), imputeMedian(), & imputeMode() (for replacing missing values with the mean, median, & mode of each variable, respectively). However, the most important method is imputeLearner(). The imputeLearner() function lets us specify a supervised machine learning algorithm to predict what the missing values would have been, based on the information held in all the other variables. For example, if we want to impute missing values of a continuous variable, the process proceeds as follows:

1. Designate the variable with missing values as the target, & the other variables as predictors.
2. Take the cases where the target is not missing as a training set.
3. Train a regression model on this training set.
4. Use the trained model to predict the missing values, & replace the NAs with these predictions.
We employ the same strategy when imputing categorical variables, except that we choose a classification algorithm instead of a regression one. So we end up using a supervised learning algorithm to fill in the blanks so that we can use another algorithm to train our final model.
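To make the procedure concrete, here’s a minimal hand-rolled sketch of learner-based imputation on a toy data set, using the rpart decision tree directly. The variable names & simulated data are purely illustrative; the imputeLearner() function we use below automates all of this for us.

```r
library(rpart)

# Toy data: y depends on x, with 10 values of y knocked out
set.seed(42)
dat <- data.frame(x = runif(100, 0, 10))
dat$y <- 2 * dat$x + rnorm(100)
dat$y[sample(100, 10)] <- NA

# Train a regression tree on the cases where the target is present
known <- dat[!is.na(dat$y), ]
mod   <- rpart(y ~ x, data = known)

# Predict the missing values & replace the NAs with the predictions
miss <- is.na(dat$y)
dat$y[miss] <- predict(mod, newdata = dat[miss, , drop = FALSE])

anyNA(dat$y)  # FALSE: every gap has been filled
```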
So how do we choose an imputation algorithm? There are a few practical considerations, but as always it will depend, & it may pay off to try different methods & see which one gives you the best performance. We can at least initially whittle the choice down to either a classification or regression algorithm, depending on whether the variable with missing values is categorical or continuous. Next, whether we have missing values in one or multiple variables makes a difference: if it’s the latter, we will need to choose an algorithm that can itself handle missing values. For example, let’s say we try to use logistic regression to impute missing values of a categorical variable. We’ll get to step 3 of our procedure & stop, because the other variables in the data (which the algorithm is trying to use to predict the categorical variable) also contain missing values. Logistic regression can’t handle that & will throw an error. If the only variable with missing values were the one we were trying to impute, this wouldn’t be a problem. Finally, the other consideration is computational budget: if the algorithm you’re using to learn the final model is already computationally expensive, using a computationally expensive algorithm to impute the missing values adds to that cost. Within these constraints, it’s often best to experiment with different imputation learners & see which one works best for the task at hand.
When doing any form of missing-value imputation, it’s extremely important to ensure that the data is either missing at random (MAR) or missing completely at random (MCAR) & not missing not at random (MNAR). If the data is MCAR, it means the likelihood of a missing value is not related to any variable in the data set. If the data is MAR, it means the likelihood of a missing value is related only to the values of the other variables in the data set. For example, someone might be less likely to fill in their salary on a form because of their age. In either of these situations, we can still build models that are not biased by the presence of missing data. But consider a situation where someone is less likely to fill in their salary on a form because their salary is low. This is an example of data missing not at random (MNAR), where the likelihood of a missing value depends on the value of the variable itself. In such a situation, you would likely build a model that is biased to overestimate the salaries of the people in your survey.
How do we tell if our data is MCAR, MAR, or MNAR? There are methods for distinguishing MCAR & MAR. For example, you could build a classification model that predicts whether a case has a missing value for a particular variable. If the model does better at predicting missing values than a random guess, then the data is MAR. If the model can’t do much better than a random guess, then the data is probably MCAR. Is there a way to tell whether data is MNAR? Unfortunately, no. Being confident your data is not MNAR depends on good experiment design & thoughtful examination of your predictor variables.
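The diagnostic just described can be sketched with plain logistic regression: model the missingness indicator itself as the outcome. Here’s a hedged toy example (simulated salary/age data, not the ozone data) where salary is MAR because older respondents are less likely to report it:

```r
# Simulate a salary survey where older people are less likely to answer
set.seed(123)
age    <- runif(200, 20, 65)
salary <- 20000 + 800 * age + rnorm(200, sd = 5000)
salary[runif(200) < plogis((age - 50) / 5)] <- NA  # missingness depends on age, not salary

# Model the missingness indicator: can we predict which values are missing?
miss_mod <- glm(is.na(salary) ~ age, family = binomial)
coef(summary(miss_mod))
# A clearly positive, significant age coefficient suggests MAR;
# no predictor doing better than chance would suggest MCAR.
```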
Note: There is a more powerful imputation technique called multiple imputation, in which we create many new data sets, replacing missing data with sensible values in each one, train a model on each of these imputed data sets, & return the average model. While this is the most widely used imputation technique, it isn’t implemented in mlr; if you’d like to use it, check out the documentation for the mice package.
For our ozone data, we have missing values across several variables, & they’re all continuous. Therefore, we’ll choose a regression algorithm that can handle missing data: rpart. We will impute the missing values with the rpart decision tree algorithm. In previous lessons, we used tree-based algorithms for classification problems, but decision trees can be used to predict continuous variables too.
imputeMethod <- imputeLearner('regr.rpart')
imputeMethod
## $learn
## function (data, target, col, learner, features)
## {
##     constructor = getTaskConstructorForLearner(learner)
##     if (is.null(features)) {
##         features = setdiff(names(data), target)
##     }
##     else {
##         not.ok = which(features %nin% names(data))
##         if (length(not.ok)) {
##             stopf("Features for imputation not found in data: '%s'",
##                 collapse(features[not.ok]))
##         }
##         not.ok = which.first(target %in% features)
##         if (length(not.ok)) {
##             stopf("Target column used as feature for imputation: '%s'",
##                 target[not.ok])
##         }
##         if (col %nin% features) {
##             features = c(col, features)
##         }
##     }
##     impute.feats = setdiff(features, col)
##     if (anyMissing(data[impute.feats]) && !hasLearnerProperties(learner,
##         "missings")) {
##         has.na = vlapply(data[impute.feats], anyMissing)
##         wrong.feats = clipString(collapse(colnames(data[impute.feats])[has.na],
##             ", "), 50L)
##         stopf("Feature(s) '%s' used for imputation has/have missing values, but learner '%s' does not support that!",
##             wrong.feats, learner$id)
##     }
##     ind = !is.na(data[[col]])
##     task = constructor("impute", data = subset(data, subset = ind,
##         select = features), target = col, check.data = TRUE,
##         fixup.data = "quiet")
##     list(model = train(learner, task), features = features)
## }
## <bytecode: 0x7f85f31af638>
## <environment: 0x7f85f3774fd8>
##
## $impute
## function (data, target, col, model, features)
## {
##     x = data[[col]]
##     ind = is.na(x)
##     if (all(!ind)) {
##         return(x)
##     }
##     newdata = as.data.frame(data)[ind, features, drop = FALSE]
##     p = predict(model, newdata = newdata)$data$response
##     replace(x, ind, p)
## }
## <bytecode: 0x7f85f37721d8>
## <environment: 0x7f85f3774fd8>
##
## $args
## $args$learner
## Learner regr.rpart from package rpart
## Type: regr
## Name: Decision Tree; Short name: rpart
## Class: regr.rpart
## Properties: missings,numerics,factors,ordered,weights,featimp
## Predict-Type: response
## Hyperparameters: xval=0
##
##
## $args$features
## NULL
##
##
## attr(,"class")
## [1] "ImputeMethod"
ozoneImp <- impute(as.data.frame(ozoneClean), classes = list(numeric = imputeMethod))
ozoneImp
## $data
##   Month Date Day Ozone Press_height Wind    Humid Temp_Sand Temp_Monte
## 1     1    1   4     3     5480.000    8 20.00000  33.00000   33.98000
## 2     1    2   5     3     5660.000    6 31.00000  38.00000   51.27180
## 3     1    3   6     3     5710.000    4 28.00000  40.00000   51.27180
## 4     1    4   7     5     5700.000    3 37.00000  45.00000   51.27180
## 5     1    5   1     5     5760.000    3 51.00000  54.00000   45.32000
## 6     1    6   2     6     5720.000    4 69.00000  35.00000   49.64000
##   Inv_height Press_grad Inv_temp Visib
## 1   5000.000     -15.00 30.56000   200
## 2   4960.907     -14.00 38.24255   300
## 3   2693.000     -25.00 47.66000   250
## 4    590.000     -24.00 55.04000   100
## 5   1450.000      25.00 57.02000    60
## 6   1568.000      15.00 53.78000    60
## (output truncated: $data contains all 361 cases, with the NAs that were present before, such as Temp_Sand for case 1 & Temp_Monte for cases 1-4, now replaced by rpart predictions)
## 65 5000.000 -12.00 33.62000 140
## 66 5000.000 -2.00 39.02000 140
## 67 5000.000 30.00 42.08000 140
## 68 3608.000 24.00 39.38000 100
## 69 5000.000 38.00 32.90000 140
## 70 5000.000 56.00 35.60000 200
## 71 5000.000 66.00 34.34000 120
## 72 613.000 -27.00 59.72000 300
## 73 334.000 -9.00 64.40000 300
## 74 567.000 13.00 61.88000 150
## 75 488.000 -20.00 64.94000 2
## 76 531.000 -15.00 71.06000 50
## 77 508.000 7.00 66.56000 70
## 78 1571.000 68.00 56.30000 17
## 79 721.000 28.00 55.40000 140
## 80 505.000 -49.00 67.28000 140
## 81 377.000 -27.00 73.22000 300
## 82 442.000 -9.00 75.74000 200
## 83 902.000 54.00 60.44000 250
## 84 3188.000 53.00 58.64000 80
## 85 1381.000 4.00 56.30000 60
## 86 5000.000 -16.00 50.00000 100
## 87 5000.000 38.00 46.94000 150
## 88 1302.000 40.00 52.70000 150
## 89 1292.000 -5.00 53.60000 200
## 90 5000.000 -14.00 52.70000 100
## 91 472.000 34.00 62.96000 300
## 92 1404.000 42.00 54.50000 120
## 93 944.000 35.00 55.76000 100
## 94 5000.000 75.00 35.24000 200
## 95 5000.000 41.00 30.92000 200
## 96 5000.000 62.00 33.44000 200
## 97 5000.000 44.00 42.08000 300
## 98 2014.000 31.00 53.42000 300
## 99 5000.000 56.00 37.22000 250
## 100 5000.000 27.00 47.66000 120
## 101 524.000 57.00 54.68000 140
## 102 5000.000 55.00 38.48000 200
## 103 5000.000 59.00 35.24000 140
## 104 5000.000 24.00 32.54000 80
## 105 2490.000 29.00 47.48000 300
## 106 5000.000 107.00 31.28000 100
## 107 5000.000 36.00 33.44000 300
## 108 5000.000 28.00 39.38000 200
## 109 1144.000 30.00 53.60000 120
## 110 547.000 1.00 66.92000 100
## 111 413.000 10.00 69.62000 120
## 112 610.000 46.00 63.68000 60
## 113 3638.000 81.00 51.26000 120
## 114 3848.000 45.00 56.84000 100
## 115 1479.000 40.00 68.00000 100
## 116 1108.000 55.00 65.48000 27
## 117 869.000 0.00 58.10000 40
## 118 5000.000 43.00 38.30000 140
## 119 5000.000 49.00 37.94000 150
## 120 1148.000 31.00 60.80000 100
## 121 856.000 4.00 75.38000 100
## 122 807.000 16.00 73.04000 120
## 123 2040.000 46.00 63.50000 150
## 124 5000.000 63.00 42.62000 100
## 125 314.000 60.00 59.00000 120
## 126 5000.000 77.00 42.62000 80
## 127 5000.000 75.00 40.82000 120
## 128 1410.000 20.00 55.22000 140
## 129 360.000 23.00 62.42000 120
## 130 1568.000 32.00 67.64000 70
## 131 1184.000 40.00 68.72000 80
## 132 898.000 24.00 73.40000 70
## 133 436.000 0.00 86.36000 40
## 134 774.000 6.00 86.00000 20
## 135 1181.000 50.00 79.88000 17
## 136 1991.000 47.00 69.62000 40
## 137 2057.000 71.00 67.28000 50
## 138 1597.000 56.00 68.00000 50
## 139 1184.000 52.00 69.44000 70
## 140 3005.000 58.00 59.18000 80
## 141 2880.000 53.00 57.38000 120
## 142 2116.600 66.00 56.30000 120
## 143 2125.000 64.00 59.00000 100
## 144 3720.000 74.00 50.90000 120
## 145 4337.000 66.00 59.36000 200
## 146 2053.000 31.00 72.86000 120
## 147 1958.000 70.00 70.52000 40
## 148 3644.000 86.00 59.36000 70
## 149 1368.000 75.00 58.46000 100
## 150 3539.000 73.00 53.60000 120
## 151 2785.000 49.00 63.32000 100
## 152 984.000 26.00 69.26000 120
## 153 1804.000 56.00 68.00000 70
## 154 3234.000 77.00 62.78000 80
## 155 3441.000 67.00 60.98000 100
## 156 1578.000 61.00 60.80000 100
## 157 1850.000 76.00 60.80000 120
## 158 2962.000 80.00 59.36000 120
## 159 2670.000 54.00 55.40000 120
## 160 5000.000 76.00 42.08000 150
## 161 5000.000 57.00 40.82000 140
## 162 5000.000 46.00 51.26000 140
## 163 987.000 28.00 63.86000 140
## 164 1148.000 43.00 66.92000 140
## 165 898.000 -24.00 77.90000 60
## 166 777.000 -1.00 82.58000 30
## 167 1279.000 75.00 71.60000 17
## 168 1046.000 69.00 68.72000 80
## 169 1167.000 50.00 74.30000 60
## 170 987.000 45.00 75.74000 100
## 171 1144.000 57.00 71.24000 120
## 172 977.000 60.00 70.70000 150
## 173 770.000 26.00 75.56000 120
## 174 629.000 -11.00 86.36000 140
## 175 337.000 -14.00 89.78000 140
## 176 590.000 26.00 85.10000 120
## 177 400.000 19.00 83.30000 120
## 178 580.000 9.00 87.26000 80
## 179 646.000 25.00 89.24000 140
## 180 826.000 41.00 84.38000 140
## 181 823.000 52.00 74.48000 150
## 182 2116.000 47.00 70.34000 120
## 183 2972.000 52.00 64.40000 120
## 184 2752.000 41.00 69.98000 140
## 185 1377.000 37.00 78.44000 100
## 186 1486.000 33.00 79.88000 50
## 187 990.000 22.00 85.10000 40
## 188 508.000 29.00 85.28000 100
## 189 1204.000 56.00 79.88000 100
## 190 2414.000 63.00 76.46000 60
## 191 2385.000 67.00 70.34000 50
## 192 2326.000 64.00 71.24000 70
## 193 3389.000 56.00 68.72000 80
## 194 2818.000 58.00 71.78000 80
## 195 3083.000 75.00 72.32000 80
## 196 2394.000 54.00 69.62000 90
## 197 2746.000 61.00 69.44000 120
## 198 2493.000 55.00 72.50000 120
## 199 1528.000 42.00 73.94000 100
## 200 111.000 40.00 78.08000 60
## 201 1899.000 45.00 76.46000 40
## 202 1289.000 32.00 75.20000 40
## 203 984.000 35.00 78.80000 70
## 204 836.000 28.00 81.50000 80
## 205 826.000 27.00 79.34000 80
## 206 1105.000 39.00 74.12000 80
## 207 1023.000 46.00 77.18000 80
## 208 1453.000 68.00 70.16000 80
## 209 2375.000 52.00 66.20000 100
## 210 2956.000 46.00 67.28000 120
## 211 2988.000 56.00 65.66000 150
## 212 4291.000 60.00 62.24000 200
## 213 3330.000 59.00 58.64000 150
## 214 2085.200 31.00 63.27119 150
## 215 2292.951 35.00 76.70686 150
## 216 2085.200 49.00 63.27119 150
## 217 829.775 30.00 76.70686 100
## 218 2292.951 43.00 63.27119 100
## 219 1233.000 30.00 70.52000 100
## 220 1450.000 36.00 69.80000 30
## 221 1069.000 28.00 74.30000 80
## 222 984.000 57.00 73.40000 70
## 223 1653.000 71.00 68.72000 60
## 224 3930.000 68.00 59.18000 150
## 225 5000.000 73.00 51.62000 200
## 226 5000.000 45.00 51.26000 200
## 227 4212.000 46.00 56.84000 200
## 228 5000.000 52.00 49.82000 250
## 229 5000.000 31.00 57.38000 300
## 230 1545.000 43.00 65.66000 70
## 231 994.000 44.00 69.62000 300
## 232 1125.000 55.00 68.00000 150
## 233 636.000 16.00 73.94000 300
## 234 748.000 32.00 77.00000 30
## 235 692.000 44.00 77.72000 100
## 236 807.000 39.00 78.80000 100
## 237 869.000 19.00 78.98000 17
## 238 800.000 24.00 85.64000 20
## 239 393.000 6.00 91.76000 4
## 240 557.000 0.00 90.68000 70
## 241 620.000 27.00 85.64000 30
## 242 1404.000 33.00 84.74000 70
## 243 898.000 21.00 80.60000 60
## 244 377.000 -2.00 83.30000 40
## 245 528.000 17.00 78.80000 50
## 246 2818.000 26.00 72.68000 70
## 247 3247.000 10.00 67.28000 140
## 248 895.000 0.00 78.08000 100
## 249 721.000 0.00 80.24000 120
## 250 774.000 -27.00 75.56000 100
## 251 134.000 0.00 77.18000 70
## 252 5000.000 39.00 51.80000 150
## 253 1965.000 13.00 60.98000 50
## 254 1853.000 10.00 70.88000 70
## 255 2342.000 7.00 71.42000 40
## 256 5000.000 56.00 51.62000 70
## 257 5000.000 37.00 47.48000 120
## 258 4028.000 35.00 55.04000 140
## 259 2716.000 26.00 63.68000 140
## 260 3671.000 31.00 65.84000 100
## 261 3795.000 31.00 66.92000 70
## 262 3120.000 35.00 66.92000 40
## 263 2667.000 17.00 63.50000 100
## 264 5000.000 39.00 52.70000 120
## 265 5000.000 36.00 48.92000 120
## 266 308.000 25.00 68.72000 140
## 267 2982.000 18.00 59.90000 120
## 268 5000.000 30.00 52.70000 70
## 269 5000.000 25.00 51.26000 150
## 270 5000.000 38.00 47.66000 200
## 271 5000.000 35.00 47.84000 200
## 272 3070.000 13.00 60.08000 200
## 273 830.000 0.00 72.14000 70
## 274 711.000 -9.00 75.56000 40
## 275 1049.000 -10.00 78.98000 50
## 276 511.000 -39.00 83.84000 17
## 277 5000.000 -40.00 67.64000 80
## 278 5000.000 -34.00 69.44000 250
## 279 377.000 -3.00 78.80000 200
## 280 862.000 27.00 73.58000 2
## 281 337.000 -17.00 81.14000 20
## 282 492.000 -2.00 82.22000 7
## 283 1394.000 13.00 75.02000 30
## 284 3146.000 27.00 64.04000 50
## 285 2234.000 11.00 66.74000 70
## 286 2109.000 21.00 69.62000 17
## 287 5000.000 23.00 54.50000 80
## 288 2270.000 -7.00 68.90000 50
## 289 2191.000 -13.00 68.72000 60
## 290 3448.000 12.00 58.64000 60
## 291 5000.000 13.00 48.92000 80
## 292 5000.000 11.00 49.46000 50
## 293 2719.000 25.00 56.84000 50
## 294 1899.000 21.00 62.06000 40
## 295 5000.000 -41.00 52.52000 40
## 296 5000.000 -21.00 50.90000 300
## 297 5000.000 -19.00 54.32000 200
## 298 2385.000 10.00 60.44000 150
## 299 1938.000 0.00 62.60000 100
## 300 590.000 -11.00 69.98000 100
## 301 2085.200 -32.00 62.58364 60
## 302 2085.200 -42.00 62.58364 150
## 303 328.000 -40.00 80.60000 150
## 304 2085.200 -43.00 62.58364 200
## 305 5000.000 -29.00 61.70000 300
## 306 829.775 -9.00 76.70686 120
## 307 597.000 -22.00 73.58000 30
## 308 2085.200 -25.00 62.58364 100
## 309 469.000 -4.00 71.78000 50
## 310 1541.000 18.00 63.14000 20
## 311 5000.000 24.00 41.90000 200
## 312 5000.000 15.00 41.72000 120
## 313 5000.000 27.00 44.60000 300
## 314 5000.000 54.00 42.80000 200
## 315 5000.000 -28.00 53.60000 70
## 316 5000.000 -38.00 63.50000 140
## 317 5000.000 -36.00 60.08000 150
## 318 2014.000 -20.00 69.98000 200
## 319 436.000 1.00 70.34000 4
## 320 830.000 3.00 66.02000 40
## 321 1112.000 -8.00 66.38000 30
## 322 1210.000 -17.00 67.82000 30
## 323 501.000 -22.00 70.88000 2
## 324 875.000 -15.00 68.90000 0
## 325 1601.000 7.00 62.06000 30
## 326 5000.000 59.00 41.90000 60
## 327 5000.000 -63.00 37.04000 150
## 328 5000.000 -52.00 41.00000 100
## 329 2280.000 -54.00 55.76000 250
## 330 2047.000 -43.00 63.50000 150
## 331 5000.000 -69.00 56.48000 200
## 332 3720.000 -50.00 61.34000 200
## 333 311.000 -24.00 69.98000 200
## 334 2536.000 28.00 56.48000 80
## 335 1154.000 -22.00 61.52000 60
## 336 2933.000 -40.00 59.90000 300
## 337 3064.000 -33.00 62.78000 200
## 338 826.000 -16.00 64.76000 300
## 339 5000.000 2.00 42.98000 50
## 340 111.000 -52.00 68.18000 40
## 341 5000.000 -48.00 54.68000 70
## 342 5000.000 -37.00 55.58000 150
## 343 5000.000 -26.00 51.08000 150
## 344 5000.000 -31.00 51.44000 70
## 345 948.000 -48.00 70.70000 200
## 346 5000.000 -50.00 50.90000 120
## 347 5000.000 -22.00 48.56000 150
## 348 3687.000 -10.00 46.94000 150
## 349 5000.000 0.00 44.24000 60
## 350 5000.000 -19.00 45.68000 70
## 351 5000.000 -28.00 45.32000 150
## 352 5000.000 -25.00 48.38000 300
## 353 508.000 -10.00 58.64000 100
## 354 2851.000 -5.00 50.00000 70
## 355 4106.941 -35.00 52.84870 40
## 356 3143.000 -37.22 60.26000 140
## 357 111.000 -14.00 72.50000 200
## 358 5000.000 -36.00 51.26000 70
## 359 3704.000 18.00 46.94000 40
## 360 5000.000 8.00 39.92000 100
## 361 5000.000 -3.00 37.22000 70
##
## $desc
## Imputation description
## Target:
## Features: 13; Imputed: 13
## impute.new.levels: TRUE
## recode.factor.levels: TRUE
## dummy.type: factor
We first use the imputeLearner() function to define what algorithm we’re going to use to impute the missing values. The only argument we supply to this function is the name of the learner, which in this case is 'regr.rpart'.
Next, we use the impute() function to create the imputed data set; the first argument is the data. We’ve wrapped our tibble in the as.data.frame() function just to prevent repeated warnings about the data being a tibble & not a data frame (these can be safely ignored). We can specify different imputation techniques for different columns by supplying a named list to the cols argument. For example, we could say cols = list(var1 = imputeMean(), var2 = imputeLearner('regr.lm')). We can also specify different imputation techniques for different classes of variable (one technique for numeric variables, another for factors) using the classes argument in the same way. Here, we use the classes argument to impute all the variables (they are all numeric) using the imputeMethod we defined.
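Put together, the imputation step that produced the output above looks something like this (a sketch, assuming ozoneClean is the tibble produced by the mutate_all() & filter() steps earlier):

```r
# Use rpart regression trees to predict (impute) missing values
imputeMethod <- imputeLearner('regr.rpart')

# Impute every numeric column; as.data.frame() avoids tibble warnings
ozoneImp <- impute(as.data.frame(ozoneClean),
                   classes = list(numeric = imputeMethod))
```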
This results in a data set we can access using ozoneImp$data, whose missing values have been replaced with predictions from a model learned by the rpart algorithm. Now we can define our task & learner using the imputed data set. By supplying 'regr.lm' as an argument to the makeLearner() function, we’re telling mlr that we want to use linear regression.
ozoneTask <- makeRegrTask(data = ozoneImp$data, target = 'Ozone')
ozoneTask
## Supervised task: ozoneImp$data
## Type: regr
## Target: Ozone
## Observations: 361
## Features:
## numerics factors ordered functionals
## 12 0 0 0
## Missings: FALSE
## Has weights: FALSE
## Has blocking: FALSE
## Has coordinates: FALSE
lin <- makeLearner('regr.lm')
lin
## Learner regr.lm from package stats
## Type: regr
## Name: Simple Linear Regression; Short name: lm
## Class: regr.lm
## Properties: numerics,factors,se,weights
## Predict-Type: response
## Hyperparameters:
Sometimes, it may be obvious which variables have no predictive value & can be removed from the analysis. Domain knowledge is also very important here: as experts, we can include variables in the model that we believe have some predictive value for the outcome we’re studying. But it’s often better to take a less subjective approach to feature selection & allow an algorithm to choose the relevant features for us.
There are two methods for automating feature selection:
- Filter methods, which rank the predictors by some statistic reflecting their relationship with the outcome variable & discard those that rank poorly
- Wrapper methods, which repeatedly train the model itself on different combinations of predictors & keep the combination that gives the best-performing model
Which method do we choose? It boils down to this: wrapper methods may result in models that perform better, because we’re actually using the model we’re training to estimate predictor importance. However, because we’re training a fresh model at each iteration of the selection process (& each step may include other preprocessing steps such as imputation), wrapper methods tend to be computationally expensive. Filter methods, on the other hand, may or may not select the best-performing set of predictors but are much less computationally expensive.
The Filter Method for Feature Selection
We’ll start with an example of using the filter method. There are a number of metrics we can use to estimate predictor importance. To see the list of available filter methods built into mlr, run listFilterMethods(). Some of the filter methods will require you to first install the FSelector package. There are too many filter methods to describe in full, but common choices include linear correlation & random forest importance, both discussed next.
The default method used by mlr (because it doesn’t depend on whether the variables are categorical or continuous) is to build a random forest to predict the outcome & return the variables that contribute most to model predictions (using the out-of-bag error). In our ozone example, because both the predictors & outcome variable are continuous, we’ll use linear correlation to estimate variable importance.
First, we use the generateFilterValuesData() function to generate an importance metric for each predictor. The first argument is the task, which contains our dataset & lets the function know that Ozone is the target variable. The second, optional argument is method, to which we supply one of the methods listed by listFilterMethods(). In this example, we use 'linear.correlation'. By extracting the $data component of this object, we get the table of predictors with their Pearson correlation coefficients.
We’ll also plot the coefficients for easier interpretability with plotFilterValues().
filterVals <- generateFilterValuesData(ozoneTask,
method = 'linear.correlation')
filterVals
## FilterValues:
## Task: ozoneImp$data
## name type filter value
## 1: Temp_Sand numeric linear.correlation 0.76977674
## 2: Temp_Monte numeric linear.correlation 0.74159034
## 3: Inv_temp numeric linear.correlation 0.72712691
## 4: Press_height numeric linear.correlation 0.58752354
## 5: Inv_height numeric linear.correlation 0.57563391
## 6: Humid numeric linear.correlation 0.45148094
## 7: Visib numeric linear.correlation 0.41471463
## 8: Press_grad numeric linear.correlation 0.23331823
## 9: Date numeric linear.correlation 0.08205094
## 10: Month numeric linear.correlation 0.05371420
## 11: Day numeric linear.correlation 0.04151444
## 12: Wind numeric linear.correlation 0.00468138
plotFilterValues(filterVals) + theme_bw()
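Under the hood, the 'linear.correlation' filter is just the absolute Pearson correlation between each predictor & the target, so we can reproduce any value in the table above ourselves (a sketch using the imputed data):

```r
# Absolute Pearson correlation between Temp_Sand & the target; this
# should match the Temp_Sand row of the filterVals table
abs(cor(ozoneImp$data$Temp_Sand, ozoneImp$data$Ozone))
```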
Now that we have a way of ranking our predictors in order of their estimated importance, we can decide how to ‘skim off’ our least informative ones. We do this using the filterFeatures() function, which takes the task as the first argument, our filterVals object as the fval argument, & one of the abs, perc, or threshold arguments. The abs argument allows us to specify the absolute number of best predictors to retain. The perc argument allows us to specify a top percentage of best predictors to retain. The threshold argument allows us to specify a value of our filtering metric (in this case, correlation coefficient) that a predictor must exceed in order to be retained. We could manually filter out predictors using one of these three methods, but we could also have the computer learn the best predictors to retain: we can wrap together our learner (linear regression) & filter method so that we can treat any of abs, perc, & threshold as hyperparameters & tune them.
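For example, manual filtering with each of these arguments looks something like this (a sketch; the values are arbitrary):

```r
# Keep the 3 predictors with the highest filter values
filterFeatures(ozoneTask, fval = filterVals, abs = 3)

# Keep the top 25% of predictors
filterFeatures(ozoneTask, fval = filterVals, perc = 0.25)

# Keep predictors whose correlation with Ozone exceeds 0.2
filterFeatures(ozoneTask, fval = filterVals, threshold = 0.2)
```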
To wrap together our learner & filter method, we use the makeFilterWrapper() function, supplying the linear regression learner we defined as the learner argument & our filter metric as the fw.method argument.
filterWrapper <- makeFilterWrapper(learner = lin,
fw.method = 'linear.correlation')
When we wrap together a learner & a preprocessing step, the hyperparameters for both become available for tuning as part of our wrapped learner. In this situation, it means we can tune the abs, perc, or threshold parameter using cross-validation to select the best-performing features. In this example, we’re going to tune the absolute number of features to retain.
First, we define the hyperparameter space with makeParamSet() & define fw.abs as an integer between 1 & 12 (the minimum & maximum number of features we’re going to retain). Next, we define our grid search, using makeTuneControlGrid(). This will try every value of our hyperparameter. We define an ordinary 10-fold cross-validation strategy using makeResampleDesc() & then perform the tuning with tuneParams(). The first argument is our wrapper learner, & then we supply our task, cross-validation method, hyperparameter space, & search procedure.
lmParamSpace <- makeParamSet(
makeIntegerParam('fw.abs', lower = 1, upper = 12)
)
gridSearch <- makeTuneControlGrid()
kFold <- makeResampleDesc('CV', iters = 10)
tunedFeats <- tuneParams(filterWrapper, task = ozoneTask, resampling = kFold,
par.set = lmParamSpace, control = gridSearch)
tunedFeats
## Tune result:
## Op. pars: fw.abs=10
## mse.test.mean=20.6510640
Our tuning procedure picks the 10 predictors with the highest correlation with ozone as the best-performing combination. But what does mse.test.mean mean? Well, the performance metrics we used for classification, such as mean misclassification error, don’t make any sense when predicting continuous variables. For regression problems, there are three commonly used performance metrics:
- Mean absolute error (MAE): the mean of the absolute residuals (the differences between the true & predicted values)
- Mean squared error (MSE): the mean of the squared residuals, which penalises large errors more heavily than MAE
- Root mean squared error (RMSE): the square root of the MSE, which puts the error back on the same scale as the outcome variable
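Computed by hand, these three metrics look like this (a sketch with made-up values):

```r
truth <- c(3, 5, 4, 6)   # hypothetical true ozone readings
preds <- c(4, 4, 6, 5)   # hypothetical model predictions

mae  <- mean(abs(truth - preds))   # mean absolute error
mse  <- mean((truth - preds)^2)    # mean squared error
rmse <- sqrt(mse)                  # root mean squared error
```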
Using the MSE performance metric, our tuned filter method has concluded that retaining the 10 features with the highest correlation with the ozone level results in the best-performing model. We can now train a final model that includes only these top 10 features in the task.
First, we create a new task that includes only the filtered features, using the filterFeatures() function. To this function, we supply the name of the existing task, the filterVals object, & the number of features to retain as the argument to abs. This value can be accessed as the $x component of tunedFeats & needs to be wrapped in unlist(); otherwise, the function will throw an error. This creates a new task that contains only the filtered predictors & retains Ozone as the target variable. Finally, we train the linear model using this task.
filteredTask <- filterFeatures(ozoneTask, fval = filterVals,
abs = unlist(tunedFeats$x))
filteredTask
## Supervised task: ozoneImp$data
## Type: regr
## Target: Ozone
## Observations: 361
## Features:
## numerics factors ordered functionals
## 10 0 0 0
## Missings: FALSE
## Has weights: FALSE
## Has blocking: FALSE
## Has coordinates: FALSE
filteredModel <- train(lin, filteredTask)
filteredModel
## Model for learner.id=regr.lm; learner.class=regr.lm
## Trained on: task.id = ozoneImp$data; obs = 361; features = 10
## Hyperparameters:
The Wrapper Method for Feature Selection
With the filter method, we generate univariate statistics describing how each predictor relates to the outcome variable. This may result in selecting the most informative predictors, but it isn’t guaranteed. Instead, we can use the actual model we’re trying to train to determine which features help it make the best predictions; this is computationally more expensive, as we train a fresh model for every combination of predictor variables we evaluate.
There are four options for searching for the best combination of predictors:
- Exhaustive search: trying every possible combination of features
- Random search: randomly selecting combinations of features to try
- Sequential search: starting with no features (forward selection) or all of them (backward selection) & adding or removing features one step at a time; ‘floating’ variants allow previously added or removed features to be reconsidered
- Genetic algorithms: ‘evolving’ combinations of features, where the best-performing combinations are carried forward to the next generation
With so many options to choose from, where do we start? Exhaustive & genetic searches are prohibitively slow for a large feature space. While random search can alleviate this problem, sequential search is a good compromise between computational cost & the probability of finding the best-performing feature combination. For this example, we’re going to use floating backward selection.
First, we define the search method using the makeFeatSelControlSequential() function. We use 'sfbs' as the method argument to select sequential floating backward selection. Then we use the selectFeatures() function to perform feature selection. To this function, we supply the learner, task, cross-validation strategy, & search method. When we run the function, each combination of predictor variables considered by the search is cross-validated using our kFold strategy to get an estimate of its performance.
featSelControl <- makeFeatSelControlSequential(method = 'sfbs')
selFeats <- selectFeatures(learner = lin, task = ozoneTask, resampling = kFold, control = featSelControl)
selFeats
## FeatSel result:
## Features (6): Month, Press_height, Humid, Temp_Sand, Temp_Mon...
## mse.test.mean=20.3337104
By printing the result of this process, we can see that the algorithm selected six predictors that gave a slightly lower MSE than the ten selected by our filter method.
Now, just as we did for the filter method, we can create a new task using the imputed data that contains only those selected predictors, & train a model on it.
ozoneSelFeat <- ozoneImp$data[, c('Ozone', selFeats$x)]
ozoneSelFeatTask <- makeRegrTask(data = ozoneSelFeat, target = 'Ozone')
wrapperModel <- train(lin, ozoneSelFeatTask)
wrapperModel
## Model for learner.id=regr.lm; learner.class=regr.lm
## Trained on: task.id = ozoneSelFeat; obs = 361; features = 6
## Hyperparameters:
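Once trained, the model can make predictions for new cases with predict(). As a quick sanity check, we can predict back onto a few rows of the data we trained on (a sketch):

```r
# Predict ozone for the first three cases; newdata must contain the
# same predictors the model was trained on
predict(wrapperModel, newdata = ozoneSelFeat[1:3, ])
```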
It’s been said many times before, but always include data-dependent pre-processing steps in your cross-validation. However, up to this point, we’ve only needed to consider a single pre-processing step. How do we combine more than one? mlr makes this simple. When we wrap together a learner & a pre-processing step, we essentially create a new learner algorithm that includes that pre-processing. So, to include an additional pre-processing step, we simply wrap the wrapped learner. The result is a sort of Matryoshka doll of wrappers, where one is encapsulated by another, which is encapsulated by another, & so on.
Using this strategy, we can combine as many preprocessing steps as we like to create a pipeline. The innermost wrapper will always be used first, then the next innermost, & so on.
We’re going to make an impute wrapper & then pass it as the learner to a feature-selection wrapper. First, we redefine our imputation method using the imputeLearner() function. Then, we create an imputation wrapper using the makeImputeWrapper() function, which takes the learner as the first argument. We use list(numeric = imputeMethod) as the classes argument to apply this imputation strategy to all of our numeric predictors.
Next comes the neat bit: we create a feature-selection wrapper using makeFeatSelWrapper() & supply the imputation wrapper we created as the learner. This is a crucial step because we’re creating a wrapper with another wrapper. We set the cross-validation method as kFold & the method of searching feature combinations as featSelControl.
imputeMethod <- imputeLearner('regr.rpart')
imputeWrapper <- makeImputeWrapper(lin,
classes = list(numeric = imputeMethod))
featSelWrapper <- makeFeatSelWrapper(learner = imputeWrapper,
resampling = kFold,
control = featSelControl)
featSelWrapper
## Learner regr.lm.imputed.featsel from package stats
## Type: regr
## Name: ; Short name:
## Class: FeatSelWrapper
## Properties: numerics,factors,missings,weights,se
## Predict-Type: response
## Hyperparameters:
Now, let’s cross-validate our entire model-building process like good data scientists. We define a task using the ozoneClean tibble, which still contains missing data. Next, we define an ordinary 3-fold cross-validation strategy for our cross-validation procedure. Finally, we start parallelisation with parallelStartSocket() & start the cross-validation procedure by supplying the learner (the wrapped wrapper), task, & cross-validation strategy to the resample() function.
ozoneTaskWithNAs <- makeRegrTask(data = ozoneClean, target = 'Ozone')
kFold3 <- makeResampleDesc('CV', iters = 3)
parallelStartSocket(cpus = detectCores() - 1)
lmCV <- resample(featSelWrapper, ozoneTaskWithNAs, resampling = kFold3)
parallelStop()
lmCV
## Resample Result
## Task: ozoneClean
## Learner: regr.lm.imputed.featsel
## Aggr perf: mse.test.mean=21.5519674
## Runtime: 82.6973
The cross-validation process proceeds like this: the data is split into three folds, & for each fold in turn, missing values in the training set are imputed by the rpart learner, sequential floating backward selection (itself cross-validated with our inner kFold strategy) picks the best-performing features, a linear model is trained on those features, & its performance is evaluated on the held-out test fold.
We can see that our model-building process gives us a mean MSE of approximately 21, suggesting a mean residual error of about 4.6 on the original ozone scale (taking the square root of 21).
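The conversion from MSE back to the original ozone scale can be done directly on the resampling result (a sketch):

```r
# Take the square root of the aggregated mse.test.mean to get RMSE
sqrt(lmCV$aggr)
```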
Due to their simple structure, linear models are usually easy to interpret: we can look at the slope for each predictor to infer how much it affects the outcome variable. However, whether these interpretations are justified depends on whether certain model assumptions have been met, so we’ll interpret the model output & generate some diagnostic plots.
First, we need to extract the model information from our model object using the getLearnerModel() function. By calling summary() on the model data, we get an output with lots of information about our model.
wrapperModelData <- getLearnerModel(wrapperModel)
summary(wrapperModelData)
##
## Call:
## stats::lm(formula = f, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.1285 -3.0253 -0.4689 2.6848 13.7779
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.0320707 26.9086028 1.116 0.265146
## Month -0.2606420 0.0753338 -3.460 0.000607 ***
## Press_height -0.0084528 0.0050197 -1.684 0.093080 .
## Humid 0.0815839 0.0137726 5.924 7.46e-09 ***
## Temp_Sand 0.2167663 0.0430415 5.036 7.58e-07 ***
## Temp_Monte 0.2656316 0.0633190 4.195 3.45e-05 ***
## Inv_height -0.0005736 0.0001762 -3.255 0.001243 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.471 on 354 degrees of freedom
## Multiple R-squared: 0.6863, Adjusted R-squared: 0.681
## F-statistic: 129.1 on 6 and 354 DF, p-value: < 2.2e-16
The Call component would normally tell us the formula we used to create the model (which variables & whether we added more complex relationships between them). Because we built this model using mlr, we unfortunately don’t get that information here, but the model formula is all of the selected predictors combined linearly together.
The Residuals component gives us some summary statistics about the model residuals. Here, we’re looking to see whether the median is approximately 0 & whether the first & third quartiles are approximately equal in magnitude. If they aren’t, this might suggest the residuals are either not normally distributed or heteroscedastic. In both situations, not only could this negatively impact model performance, but it could make our interpretation of the slopes incorrect.
The Coefficients component shows us a table of model parameters & their standard errors. The intercept is the estimate of the ozone level when all other variables are 0. In this particular case, it doesn’t make sense for some of our variables to be 0 (month, for example), so we don’t draw too much interpretation from this. The estimates for the predictors are their slopes. For example, our model estimates that for every one-unit increase in the Temp_Sand variable, Ozone increases by about 0.2 (holding all other variables constant). The Pr(>|t|) column contains the p values that, in theory, represent the probability of seeing a slope this large if the population slope were actually 0. Use the p values to guide your model-building process, by all means; but there are some problems associated with p values, so don’t put too much faith in them.
Finally, Residual standard error is the same as RMSE, Multiple R-squared is an estimate of the proportion of variance in the data accounted for by our model, & the F-statistic is the ratio of variance explained by our model to the variance not explained by the model. The p value here is an estimate of the probability that our model is better than just using the mean of Ozone to make predictions.
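If you need these summary statistics downstream, they can be extracted from the summary object programmatically (a sketch):

```r
modelSummary <- summary(wrapperModelData)

modelSummary$r.squared       # multiple R-squared
modelSummary$adj.r.squared   # adjusted R-squared
modelSummary$sigma           # residual standard error
modelSummary$coefficients    # estimates, SEs, t values & p values
```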
We can very quickly & easily print diagnostic plots for linear models in R by supplying the model data as the argument to plot(). We split the plotting device into four parts using the mfrow argument to the par() function. This means when we create our diagnostic plots (there will be four of them), they will be tiled in the same plotting window. These plots may help us identify flaws in our model that impact predictive performance.
par(mfrow = c(2, 2))
plot(wrapperModelData)
par(mfrow = c(1, 1))